Adjustments :- Adjustments sometimes leads to a simpler forecasting task. The purpose of these adjustments is to simplify the patterns in the historical data by removing the known sources of variation or by making the pattern more consistent across the whole dataset.

There are 4 kinds of adjustments :- calendar adjustments, population adjustments, inflation adjustments and mathematical transformation.

  1. Calender Adjusment :- Some of the variation in the data may be due to simple calender effects. For eg. if we are studying the monthly milk production on a farm, there will be variation between the months simply because of different no. of days in each month.

Below we are plotting the original series and series after the calendar adjustment.

library(fpp2)
## Loading required package: ggplot2
## Loading required package: forecast
## Loading required package: fma
## Loading required package: expsmooth
dframe = cbind(Monthly = milk,
               DailyAverage = milk/monthdays(milk))

autoplot(dframe, facet = TRUE) +
  xlab("Years") + ylab("Pounds") +
  ggtitle("Milk Production per cow")

  1. Population Adjustment :- Any data that are affected by population changes can be adjusted to give per-capita data. That is, consider the data per person (or per thousand people, or per million people) rather than the total.

  2. Inflation Adjustment :- Data which are affected by the value of money are best adjusted before modelling. To make these adjustments, a price index is used. If \(z_t\) denotes the price index and \(y_t\) denotes the original house price in year t, then \(x_t = y_t /z_t ∗ z_{2000}\) gives the adjusted house price at year 2000 dollar values.

  3. Mathematical Transformation:- Transformations are important if the data shows variation that increase and decrease with the level of the series. Some examples of transformation is logarithmic transformation, power transformation etc. Log transformation are easiest to interpret. Changes in a log value are relative (or percentage) changes on the original scale. So if log base 10 is used, then an increase of 1 on the log scale corresponds to a multiplication of 10 on the original scale.

Power transformation can be written as \[w_t = y_{t}^{p}\].

Box-Cox Transformation:- This is a family of transformation that includes both log and power transformation. This transformation depends on â„·.

\[ w_t = log(y_t) if\ \ λ = 0\] \[w_t = (y_{t}^{λ} - 1) / λ \ \ otherwise\] If λ = 1, then \(w_t = y_t −1\) , so the transformed data is shifted downwards but there is no change in the shape of the time series. But for all other values of λ, the time series will change shape.

Let’s see how change in value of λ will effect our time series equation.

par(mfrow = c(3,2))
plot(log(elec), main = "Electricity Consumption US", ylab = "lambda = 0")
for (i in seq(0.1,0.5,0.1)){
  plot((elec^i - 1)/i, main = "Electricity Consumption US", ylab = paste("lambda = ", i))
}

par(mfrow = c(3,2))
for (i in seq(0.6,1,0.1)){
  plot((elec^i - 1)/i, main = "Electricity Consumption US", ylab = paste("lambda = ", i))
}

A good value of λ that makes the size of seasonal variation about the same across the whole series In the above graph values around 0.3 are working well. We can also identify the ideal value using boxcox.lambda function. But we don’t wants use the exact same value as provided by the function. We need to select a simpler value for λ but the function will help us with the nearby value that we need consider.

BoxCox.lambda(elec)
## [1] 0.2654076
lambda = BoxCox.lambda(elec)
autoplot(BoxCox(elec, lambda))

Having chosen a transformation, we need to forecast the transformed data. Then, we need to reverse the transformation (or back-transform) to obtain forecasts on the original scale. The reverse Box-Cox transformation is given by \[ y_t = exp(w_t) if\ \ λ = 0\] \[y_t = (λw_t + 1)^{1/λ} \ otherwise\]

Features of transformation :- 1. If the time series values are negative than no power transformation is possible. We can adjust the values by adding a constant value. 2. The forecasting results are relatively insenstive to the value of λ. Transformation usually have large effect on prediction interval.

Another issue from the transformation is that we transform back to the orginal series, our point forecast won’t be the average of the forecast distribution but usually it is the median of the forecast distribution (assuming that the distribution on the transformed space is symmetric).

For the Box Cox transformation, the back transformed mean is given by :- \[y_t = exp(w_t)[1 + \frac{σ^{2}_{h}}{2}]\ if\ λ = 0\] \[y_t = (λw_t + 1)^{1/λ} \ [1 + \frac{σ^{2}_{h} (1-λ)}{2(λw_t + 1)^2}] \ \ \ otherwise\]

where \(σ^{2}_{h}\) is the h-step forecast variance.

The difference between the back transformed forecast and the mean is called bias. When we use the mean instead of median, we say the point forecasts have been bias-adjsuted.

We will see an example to see how much difference this bias-adjustment makes. Data :- Average annual price of eggs.

fc = rwf(eggs, drift = TRUE, lambda = 0, h = 50, level = 80)
fc2 = rwf(eggs, drift = TRUE, lambda = 0.3, h = 50,level = 80, biasadj = TRUE)

autoplot(eggs) +
  autolayer(fc, series = "Simple back transformation") +
  autolayer(fc2, series = "Bias adjusted", PI = FALSE) +
  guides(colour = guide_legend(title = "Forecast"))

The skewed forecast distribution pulls up the point forecast when we use the bias adjustment.

Excercise :- 1. Choosing the right Box-Cox transformation. a). Annual US net electricity generation.

autoplot(usnetelec) + 
  ggtitle("US net electricity generation") +
  xlab("Year") + ylab("billion kwh")

The above plot shows that the variation in time series is not varying that much initially but in a portion of data between 1970-1990 there is some increase in variance. It doesn’t looks like the transformation is going to fix but we can check it.

BoxCox.lambda(usnetelec)
## [1] 0.5167714

Now let’s plot for values 0.5 and 0.6.

autoplot((usnetelec^0.5-1)/0.5) + 
  ggtitle("US net electricity generation (lambda = 0.5)") +
  xlab("Year") + ylab("billion kwh")

autoplot((usnetelec^0.6-1)/0.6) + 
  ggtitle("US net electricity generation (lambda = 0.6)") +
  xlab("Year") + ylab("billion kwh")

Box Cox transformation is not helping much in this case.

b). Quarterly US GDP.

autoplot(usgdp) +
  ggtitle("Quarterly US GDP") +
  xlab("Quarter")

The time series is not increasing linearly but in a polynomial fashion which can be fixed using a tranformation.

BoxCox.lambda(usgdp)
## [1] 0.366352

Let’s plot for 0.3 and 0.4.

autoplot(BoxCox(usgdp,0.3)) + 
  ggtitle("Quarterly US GDP (lambda = 0.3)") +
  xlab("Quarter")

autoplot(BoxCox(usgdp,0.4)) + 
  ggtitle("Quarterly US GDP (lambda = 0.4)") +
  xlab("Quarter")

Any value around 0.36 would work in this situation.

c). Monthly Copper Price

autoplot(mcopper) +
  ggtitle("Monthly copper prices") +
  xlab("Month")

The above shows that variation in the time series is shifting with level but it’s not very consistent. We can see if box cox transformation can fox this.

BoxCox.lambda(mcopper)
## [1] 0.1919047

Let’s plot for 0.2 and 0.3.

autoplot(BoxCox(mcopper,0.2)) + 
  ggtitle("Monthly copper prices (lambda = 0.2)") +
  xlab("Month")

autoplot(BoxCox(mcopper,0.3)) + 
  ggtitle("Monthly copper prices (lambda = 0.3)") +
  xlab("Month")

Data has a reduced variance now.

d). Monthly US domestic enplanements

autoplot(enplanements) +
  ggtitle("Monthly US domestic enplanements") +
  xlab("Month")

The above time plot shows that variation is increasing with the level of the series.

BoxCox.lambda(enplanements)
## [1] -0.2269461
autoplot(BoxCox(enplanements,-0.2)) +
  ggtitle("Monthly US domestic enplanements (lambda = -0.2)") +
  xlab("Month")

It is visually very clear that the variation in the time series is more stable now.

  1. Montly Canadian Gas Production
autoplot(cangas) +
  ggtitle("Monthly Canadian Gas Production") +
  xlab("Month")

The time plot shows that variation is not constant across the series but it is not increasing with the level. Box-Cox transformation won’t be able to solve this issue. We can see it below as well.

BoxCox.lambda(cangas)
## [1] 0.5767759
autoplot(BoxCox(cangas,0.6)) +
  ggtitle("Monthly US domestic enplanements (lambda = -0.2)") +
  xlab("Month")

The shape of the series remains the same.

3). Retail data

retaildata = readxl::read_excel("/Users/ankittyagi/Desktop/Time Series Forecasting/Time-Series-Forecasting/Time_series_data/retail.xlsx",skip =1)
myts = ts(retaildata[,"A3349873A"], start = c(1984,4), frequency = 12)
autoplot(myts)

The retail data also shows increase in the variance as level of the series increase.

BoxCox.lambda(myts)
## [1] 0.1276369
autoplot(BoxCox(myts,0.1)) 

We can see that Box Cox transformation has stabilized the variance.

4). Unemployment benefits in Australia

autoplot(dole) +
  ggtitle("Unemployment benefits in Australia") +
  xlab("Month")

It looks like the variance is not increasing much with the level. But the Box-Cox transformation might help in visualizing the series much better.

BoxCox.lambda(dole)
## [1] 0.3290922
autoplot(BoxCox(dole,0.3)) 

Now we can see the series more clearly.

Accidental Deaths in USA :-

autoplot(usdeaths) +
  ggtitle("Accidental Deaths in USA") +
  xlab("Month") + ylab("No. of Deaths")

Looking at the graph it looks like no transformation is required.

autoplot(bricksq) +
  ggtitle("Quarterly clay brick production") +
  xlab("Quarter")

It looks like no transformation is required but we can still check if Box Cox is making the visualization more clear.

BoxCox.lambda(bricksq)
## [1] 0.2548929
autoplot(BoxCox(bricksq,0.2)) 

As we can see that transformation is not making any difference as well.